Process of analysis

It is not simple to decide how to address the multiple issues that can arise after a first run through the assumptions of the Rasch model, and there is often more than one way to obtain a scale that fits. In general, a well-fitting scale should be obtained while preserving as much of the scale's original features as possible.

Typically, articles report a) the fit of the scale at the start of the analysis, with a focus on the results of testing each of the Rasch assumptions, and b) the fit at the end of the calibration process, once all breaches of the assumptions have been addressed. How to get from a) to b) most efficiently is not written down anywhere, but some general advice is available.

One general recommendation is to start with the assumption of local item dependency (LID). Strong LID can go along with multidimensionality and can also cause item thresholds to malfunction. Depending on the extent of the LID, aggregating the dependent items into testlets can resolve the multidimensionality.
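
As a minimal sketch of the testlet approach: two locally dependent items are summed into one polytomous super-item before refitting the model. The items A and B below are hypothetical stand-ins for a pair flagged by the residual correlations; the data are invented for illustration.

```r
# hypothetical responses: items A and B were flagged as locally dependent
resp <- data.frame(A = c(0, 1, 2, 1),
                   B = c(1, 1, 2, 0),
                   C = c(0, 2, 1, 1))

# sum the dependent items into one polytomous super-item (testlet)
resp$AB <- resp$A + resp$B

# keep the testlet and the remaining items; the PCM would then be refitted
# on this reduced item set
resp.testlet <- resp[, c("AB", "C")]
resp.testlet
```

The testlet now has a larger score range (here 0 to 4), and the dependency between A and B is absorbed into a single item.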

Once the questionnaire is free of LID and found to be unidimensional, the interpretation of the reliability starts to make sense. At this point, the item fit statistics can also be investigated further to make sure that all items or testlets work well to measure the construct.

Finally, the analysis of differential item functioning (DIF) is undertaken. Depending on the purpose of the questionnaire, it is worthwhile to gather information on how the items work for different person subgroups or in different assessment situations, and to make sure that DIF does not indicate any unfair treatment of some subgroups.
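
One way to screen for DIF with eRm is a Wald test per item, splitting the sample by an external person factor. The following is a minimal sketch on simulated dichotomous data; the grouping vector is a hypothetical stand-in for a real person factor such as gender, and the person and item parameters are arbitrary.

```r
library(eRm)

set.seed(42)
# simulate dichotomous responses for 200 persons and 6 items
X <- sim.rasch(persons = rnorm(200), items = rnorm(6))
rm.mod <- RM(X)

# hypothetical grouping vector (e.g. gender); must align with the rows of X
group <- rep(0:1, each = 100)

# Wald test per item: large z-values / small p-values suggest DIF
wt <- Waldtest(rm.mod, splitcr = group)
wt$coef.table
```

For a polytomous scale such as the SRG, the same splitting logic applies to the PCM fit, using the real person factor instead of the simulated grouping vector.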

The complexity of the analysis and the number of models fitted can challenge the clarity of the reporting of a Rasch analysis.

Schematizations can be helpful.

Example PTGI (Kunz 2019) - Conceptualization of the analysis approach

Also, instead of reporting each small adjustment step and test, only the fit statistics at the start and in the final Rasch model can be shown in an analysis summary table.

Example WHODAS 2.0 (Chiu 2019) - Summarizing

Finalizing SRG

During the course, the following issues were found for the SRG scale.

  1. Item Fit (MS3): misfit in item SRG15 I learned that there are more people who care about me than I thought (Outfit = 1.622; Infit = 1.421)
  2. Targeting and Reliability (MS4): OK
  3. Threshold Ordering (MS5): OK
  4. Local Item Dependencies (MS6-MS7): no LID > 0.2 (cut-off from Christensen)
  5. Multidimensionality (MS8-MS10): OK
  6. Differential Item Functioning (MS11-MS12): SRG8 I learned to be a more confident person shows DIF for gender

In the exercise of seminar 7 (LID, continued), creating a testlet for SRG15 and SRG13 resulted in bad fit. Exceptionally, I would therefore suggest removing SRG15 from the scale. Usually, one should be cautious with the deletion of items, especially when testing the metric properties of a scale that is already established in practice and research. When developing a new scale, where items are still being selected, the deletion of misfitting items is less problematic.

Also, with regard to DIF, let's assume that we want to arrive at one metric for the entire SCI sample and that the systematic difference between genders in item SRG8 is not understood as favoritism toward one of the subgroups. In summary, and to keep the example simple, we do not split the item and keep a single difficulty estimate for SRG8.

First, load the data and remove item SRG15:

urlfile = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2021_Rasch/main/Data/SRG_Data_Course_UZH_HS2021.csv"

srg.data=read.csv(url(urlfile))

dim(srg.data)
## [1] 450  26
colnames(srg.data)
##  [1] "X"                      "ID"                     "Age"                   
##  [4] "Gender"                 "Completeness"           "para.tetra_1"          
##  [7] "traumatic_nontraumatic" "PersStat"               "SRG1"                  
## [10] "SRG2"                   "SRG3"                   "SRG4"                  
## [13] "SRG5"                   "SRG6"                   "SRG7"                  
## [16] "SRG8"                   "SRG9"                   "SRG10"                 
## [19] "SRG11"                  "SRG12"                  "SRG13"                 
## [22] "SRG14"                  "SRG15"                  "TP"                    
## [25] "ID_Unik"                "wgt"
srg.items=c("SRG1", "SRG2",  "SRG3",  "SRG4",  "SRG5",  "SRG6",  "SRG7",  "SRG8",  "SRG9", "SRG10", "SRG11", "SRG12", "SRG13", "SRG14") # minus "SRG15"

# dataset with the SRG items only (SRG15 excluded)
data.srg = srg.data[, srg.items]

Now, the analysis is run again without item SRG15:

library(eRm)
library(iarm)

PCM.srg.2 = PCM(data.srg, sum0 = TRUE)

#Targeting and reliability
scale.properties = test_prop(PCM.srg.2)
scale.properties
Separation Reliability        Test difficulty            Test target 
             0.8975534              0.0350000              0.0970000 
      Test information 
             5.7793920 
scale.properties[c(1,3)]
Separation Reliability            Test target 
             0.8975534              0.0970000 
#Person-Item Map
plotPImap(PCM.srg.2, sort = TRUE, main = "SRG-metric")

PP.srg.2 = person.parameter(PCM.srg.2)
resid.srg.2 = residuals(PP.srg.2)

#Item Fit
eRm::itemfit(PP.srg.2)

Itemfit Statistics: 
        Chisq  df p-value Outfit MSQ Infit MSQ Outfit t Infit t Discrim
SRG1  403.527 430   0.816      0.936     0.975   -0.930  -0.382   0.554
SRG2  481.746 430   0.043      1.118     1.115    0.996   1.537   0.481
SRG3  479.234 429   0.047      1.114     1.079    1.593   1.302   0.543
SRG4  383.938 430   0.946      0.891     0.926   -1.538  -1.243   0.618
SRG5  420.058 432   0.651      0.970     1.017   -0.340   0.306   0.586
SRG6  401.453 432   0.851      0.927     0.917   -0.915  -1.369   0.628
SRG7  344.610 429   0.999      0.801     0.829   -3.075  -2.973   0.664
SRG8  396.960 432   0.886      0.917     0.938   -0.925  -0.957   0.606
SRG9  397.358 432   0.883      0.918     0.923   -1.236  -1.297   0.602
SRG10 378.226 430   0.966      0.878     0.878   -1.933  -2.040   0.598
SRG11 384.962 432   0.949      0.889     0.897   -1.418  -1.726   0.647
SRG12 401.636 427   0.806      0.938     0.961   -0.538  -0.543   0.583
SRG13 514.941 432   0.004      1.189     1.110    2.499   1.708   0.384
SRG14 362.936 432   0.993      0.838     0.855   -2.446  -2.522   0.655
#LID - local item dependencies
cor.resid.srg.2 = cor(resid.srg.2, use = "pairwise.complete.obs")

cor.resid.srg.2.tri = cor.resid.srg.2
cor.resid.srg.2.tri[upper.tri(cor.resid.srg.2.tri, diag = TRUE)] = NA

which(cor.resid.srg.2.tri > 0.2, arr.ind = TRUE)
     row col
#PCA  eigenvalues 
eigen(cor.resid.srg.2)$values
 [1] 1.82693212 1.67990879 1.47913292 1.23935187 1.17468146 1.05375863
 [7] 0.96486843 0.89601162 0.86659402 0.84033068 0.72041348 0.67329405
[13] 0.57098978 0.01373214
#thresholds
thres_map_fct = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2020_Rasch/master/RFunctions/threshold_map_fct.r"

source(url(thres_map_fct))

ThresholdMap(thresholds(PCM.srg.2))

Deleting SRG15 resulted in:

  1. Item Fit: good item fit for all remaining items, using infit and outfit < 1.2
  2. Targeting and Reliability: OK, PSI = 0.8976, test targeting = 0.097
  3. Threshold Ordering: OK
  4. Local Item Dependencies: OK, no LID > 0.2
  5. Multidimensionality: OK, 1st eigenvalue < 2
  6. Differential Item Functioning: not applicable

Transformation Table

In principle, once the scale has been calibrated with the Rasch model, a transformation table is created that links the raw scores of the final scale to the corresponding logit ability estimates, and the logit scores to user-friendly rescaled scores. The range of the user-friendly score is typically 0 to 100, which allows scores to be expressed as a percentage of the maximum obtainable score. A transformed range from 0 to 100 only makes sense if the original range is already large; one would not rescale to 0-100 if the original instrument score range is very small. Otherwise, another convenient score range is selected.
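
The rescaling itself is a simple linear map: a logit score x on [min, max] is transformed to (x - min) / (max - min) * 100. A base-R sketch with three illustrative logit values (this reproduces what scales::rescale(x, to = c(0, 100)) computes):

```r
# linear rescaling of logit scores to a 0-100 range
rescale_0_100 <- function(x) (x - min(x)) / (max(x) - min(x)) * 100

# illustrative logit values: minimum, midpoint, maximum
logits <- c(-4.47, 0.00, 4.87)
round(rescale_0_100(logits), 2)
# [1]   0.00  47.86 100.00
```

The minimum maps to 0, the maximum to 100, and intermediate scores in proportion.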

library(scales)

names(PP.srg.2)
 [1] "X"           "X01"         "X.ex"        "W"           "model"      
 [6] "loglik"      "loglik.cml"  "npar"        "iter"        "betapar"    
[11] "thetapar"    "se.theta"    "theta.table" "pred.list"   "hessian"    
[16] "mpoints"     "pers.ex"     "gmemb"      
T.Table = as.data.frame(cbind(PP.srg.2$pred.list[[1]]$x, PP.srg.2$pred.list[[1]]$y))
colnames(T.Table) = c("Raw Score", "Logit Score")

#create a rescaled Rasch-Score in a convenient range, here from 0 to 100
Transformed_Score = scales::rescale(T.Table[,2], to = c(0, 100))

T.Table = cbind(T.Table, Transformed_Score)
colnames(T.Table) = c("Raw Scores", "Logit Scores", "0-100 Scores")


# round the last two columns to two decimals
T.Table[,c(2,3)] = round(T.Table[, c(2,3)], 2)


T.Table
   Raw Scores Logit Scores 0-100 Scores
1           0        -4.47         0.00
2           1        -3.57         9.66
3           2        -2.74        18.53
4           3        -2.22        24.12
5           4        -1.83        28.32
6           5        -1.51        31.75
7           6        -1.23        34.68
8           7        -0.99        37.29
9           8        -0.77        39.67
10          9        -0.56        41.87
11         10        -0.37        43.95
12         11        -0.18        45.94
13         12         0.00        47.86
14         13         0.18        49.75
15         14         0.35        51.60
16         15         0.52        53.46
17         16         0.70        55.32
18         17         0.87        57.21
19         18         1.05        59.15
20         19         1.24        61.17
21         20         1.44        63.29
22         21         1.65        65.55
23         22         1.88        68.01
24         23         2.14        70.76
25         24         2.43        73.93
26         25         2.79        77.78
27         26         3.27        82.88
28         27         4.04        91.07
29         28         4.87       100.00

Computer Adaptive Testing

An interesting application of Rasch- (and IRT-) derived parameters is found in computer-adaptive testing (CAT). Computer-based testing is a broad field which includes linear and adaptive testing.

In linear testing, the same test questions are administered in the same order to all respondents, similar to a standard paper-based test. In contrast to paper-and-pencil, however, the computer can immediately process the responses and compute the respondents' scores.

Adaptive testing is a type of testing where the scale adjusts to the ability of the respondent. The questions a respondent receives are selected based on the past responses; in that sense, the test adapts to the response pattern and ability of the respondent. The goal of a CAT is to select items that reduce the standard error of measurement and help obtain a stable ability estimate. Typically, when the ability estimate varies only within a small margin of error, the test can be stopped. Computer-adaptive testing offers several advantages, such as shortening test delivery time and immediate score reporting to candidates.

Boston University: School of Public Health

In R, the package mirtCAT, from the same authors as mirt, allows adaptive testing using the item parameters estimated with mirt or with any other IRT software (via manual entry of the difficulty parameters). The package mirtCAT can also be used for multidimensional testing. While other packages for CAT are available in R, mirtCAT allows generating a user-friendly interface to administer a CAT.

Building an interface for a CAT requires:

  • Having a suitable pool of items (or item bank) that has been calibrated for the population of interest.
  • Initializing the CAT session: a starting rule for selecting the first item (start_item =).
  • Selecting the next item to administer: a rule deciding which item is the next best (criteria =).
  • Selecting the IRT scoring method: how to estimate and update the person ability (method =).
  • Terminating the application: a rule to decide when enough information has been collected (design = list()).

The values that these settings can take are documented under ?mirtCAT.
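
To illustrate how these settings combine, here is a minimal non-GUI sketch with mirtCAT on a small, simulated, Rasch-calibrated item bank. All values (20 items, ability 0.5, min_SEM = 0.4) are illustrative choices, not recommendations.

```r
library(mirt)
library(mirtCAT)

set.seed(1)
# simulate and calibrate a small dichotomous item bank under the Rasch model
dat <- simdata(a = matrix(1, 20, 1), d = matrix(rnorm(20)), N = 500,
               itemtype = "dich")
mod <- mirt(dat, 1, itemtype = "Rasch", verbose = FALSE)

# simulate one respondent with ability 0.5 and run the CAT without the interface
pat <- generate_pattern(mod, Theta = matrix(0.5))
res <- mirtCAT(mo = mod, local_pattern = pat,
               start_item = "MI",             # first item by maximum information
               criteria = "MI",               # next item by maximum information
               method = "MAP",                # ability estimation method
               design = list(min_SEM = 0.4))  # stop when the SE falls below 0.4
summary(res)
```

With local_pattern the CAT runs on the simulated responses instead of launching the Shiny interface, which is convenient for checking the design before administering it to real respondents.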